Construction of DNA sequences and their use for microbial production of proteins, in particular human serum albumin.



By means of reverse transcription of mRNA coding
for a desired polypeptide, there is obtained a set of
overlapping fragments of duplex cDNA, which together
correspond to the whole mRNA molecule. The fragments
have overlapping regions bearing sites for restriction
enzymes, such that cutting and ligation gives DNA
corresponding to the polypeptide. This is introduced into
a vector in reading frame with a promoter. Transformation
of a microorganism enables expression of the
polypeptide.

The construction via fragments enables large molecules
to be made. Thus, human serum albumin (HSA)
is produced by E. coli transformed with plasmid pHSA1.
This includes DNA made from fragments derived from
reverse transcription of mRNA from human liver.




0073646 



CONSTRUCTION OF DNA SEQUENCES AND THEIR USE
FOR MICROBIAL PRODUCTION OF PROTEINS, IN
PARTICULAR HUMAN SERUM ALBUMIN



This invention relates to recombinant DNA technology.
It particularly relates to the application of the technology
to the production of human serum albumin (HSA) in
microorganisms for use in the therapeutic treatment of humans.
In one aspect the invention relates to a technique for
producing DNA sequences encoding desired polypeptides. In
another aspect it relates to the construction of microbial
expression vehicles containing DNA sequences encoding a
protein, e.g. human serum albumin or the biologically active
component thereof, operably linked to expression-effecting
promoter systems, and to the expression vehicles so
constructed. In another aspect, the present invention
relates to microorganisms transformed with
such expression vehicles and thus directed to express the DNA
sequences referred to above. In yet other aspects, this invention
relates to the means and methods of converting the end product of such
expression into entities, such as pharmaceutical compositions, useful for
the therapeutic treatment of humans. In preferred embodiments, this
invention provides particular expression vectors whose sequences are
arranged such that mature human serum albumin is produced directly.

In one aspect, the present invention is particularly directed to a method of
preparing cDNA encoding polypeptides or biologically active portions
thereof. This aspect provides the means and methods of utilizing
synthetic primer DNA corresponding to a portion of the mRNA of the
intended polypeptide, adjacent to a known endonuclease restriction site,
in order to obtain by reverse transcription a series of DNA fragments
encoding sequences of the polypeptide. These fragments are prepared such
that the entire desired protein coding sequence is represented, the
individual fragments containing overlapping DNA sequences harboring
common endonuclease restriction sites within the corresponding
overlapping sequence. This aspect facilitates the selective cleavage and
ligation of the respective fragments so as to assemble the entire cDNA
sequence encoding the polypeptide in proper reading frame. This
discovery permits cDNA to be obtained for high molecular weight proteins
which otherwise may not be available through use of the usual reverse
transcriptase methods and/or chemical synthesis.
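The fragment-assembly idea can be illustrated with a short Python sketch that treats DNA as plain strings. It is a minimal model under stated assumptions: the helper name `join_at_site` and the toy sequences are hypothetical, the shared site (here the PstI recognition sequence CTGCAG) is assumed to occur exactly once in each fragment, and sticky-end chemistry is ignored, so only the resulting sequence is modelled.

```python
def join_at_site(upstream: str, downstream: str, site: str) -> str:
    """Join two overlapping fragments at a restriction site they share.

    Hypothetical helper: keeps `upstream` through the shared site and
    appends the part of `downstream` that follows the same site. The
    site must occur exactly once in each fragment, mirroring the
    requirement that it lie in the overlapping region.
    """
    if upstream.count(site) != 1 or downstream.count(site) != 1:
        raise ValueError(f"{site} must occur exactly once in each fragment")
    left = upstream[: upstream.index(site) + len(site)]
    right = downstream[downstream.index(site) + len(site):]
    return left + right


# Toy "full-length" sequence and two overlapping partial clones of it.
full = "ATGAAAGGGCTGCAGTTTCCCTAA"
clone_5prime = full[:17]   # covers the 5' end and the shared site
clone_3prime = full[6:]    # covers the shared site and the 3' end

assembled = join_at_site(clone_5prime, clone_3prime, "CTGCAG")
print(assembled == full)  # True
```

The helper fails loudly when the site is absent or repeated, since a second copy of the site outside the overlap would make the junction ambiguous.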



The publications and other materials used herein to illuminate the
background of the invention, and in particular cases to provide
additional details respecting its practice, are incorporated herein by
reference; for convenience, they are numerically referenced in the
following text and grouped in the appended bibliography.
(A) Human Serum Albumin

Human serum albumin (HSA) is the major protein species in adult
serum. It is produced in the liver and is largely responsible
for maintaining normal osmolarity in the bloodstream; it also
functions as a carrier for numerous serum molecules (1, 2).
The apparent fetal counterpart of HSA is α-fetoprotein, and
studies have been undertaken to compare the two, as well as rat
serum albumin and α-fetoprotein (3-8). The complete protein
sequence of HSA has been published (9-12). The published
protein sequences of HSA disagree in about 20 residues as well
as in the total number of amino acids in the mature protein
[584 amino acids (9); 585 (12)]. Some evidence suggests that
HSA is initially synthesized as a precursor molecule (13, 14)
containing a "prepro" sequence. The precursor forms of bovine
(15) and rat (16) serum albumin have also been sequenced.

Albumin is used therapeutically for the treatment of hypovolemia,
hypoproteinemia and shock. It is currently used to
improve the plasma oncotic (colloid osmotic) pressure, which is produced
by solutes (colloids) that are not able to pass through
capillary pores. Inasmuch as albumin has a low permeability
constant, it essentially confines itself to the intravascular
compartment. When different concentrations of nondiffusible
particles exist on opposite sides of a semipermeable membrane, water
crosses the partition until the concentrations of particles are
equal on both sides. In this process of osmosis, albumin plays
a vital role in maintaining the liquid content of blood.
Thus, the therapeutic benefits of albumin administration lie
primarily in the treatment of conditions where there is a loss
of liquid from the intravascular compartment, such as
surgical operations, shock, burns, and hypoproteinemia
resulting in edema. Albumin is also used in diagnostic
applications, where its nonspecific ability to bind to other
proteins makes it useful in various diagnostic solutions.

Presently, human serum albumin is produced by whole blood
fractionation techniques and thus is not available in large
amounts at competitive costs. The application of recombinant
DNA technology makes possible the production of copious amounts
of human serum albumin by genetically directing
microorganisms to produce it efficiently. The present
invention may make purified HSA produced through recombinant
DNA technology available more abundantly and
at lower cost than is now possible. The present
invention also provides knowledge of the DNA sequence
organization of human serum albumin and its deduced amino acid
sequence, helping to elucidate the evolutionary, regulatory
and functional properties of human serum albumin as well as of its
related proteins such as α-fetoprotein.

More particularly, the present invention provides for the isolation
of cDNA clones spanning the entire sequence of the protein
coding and 3' untranslated portions of HSA mRNA. These cDNA
clones were used to construct a recombinant expression vehicle
which directed the expression in a microorganism strain of the
mature HSA protein under control of the trp promoter. The
present invention also provides the complete nucleotide and
deduced amino acid sequence of HSA.
Reference herein to the expression of "mature human serum
albumin" connotes the microbial production of human serum
albumin unaccompanied by the presequence ("prepro") that
immediately attends translation of the human serum albumin
mRNA. Mature human serum albumin, according to the present
invention, is expressed immediately from a translation start
signal (ATG), which also encodes the amino acid methionine
linked to the first amino acid of albumin. This methionine
can be naturally cleaved by the microorganism so as
to prepare the human serum albumin directly. Mature human
serum albumin can also be expressed together with a conjugated
protein other than the conventional leader, the conjugate being
specifically cleavable in an intra- or extracellular
environment. See British patent publication number 2007676A.
Finally, the mature human serum albumin can be produced in
conjunction with a microbial signal polypeptide which
transports the conjugate to the cell wall, where the signal is
processed away and the mature human serum albumin secreted.



(B) Recombinant DNA Technology



With the advent of recombinant DNA technology, the controlled
microbial production of an enormous variety of useful
polypeptides has become possible. Many mammalian polypeptides,
such as human growth hormone and human and hybrid leukocyte
interferons, have already been produced by various
microorganisms. The power of the technology puts within reach
the microbially directed manufacture of hormones, enzymes,
antibodies, and vaccines useful for a variety of drug-targeting
applications.






A basic element of recombinant DNA technology is the plasmid,
an extrachromosomal loop of double-stranded DNA found in
bacteria, oftentimes in multiple copies per cell. Included in
the information encoded in the plasmid DNA is that required to
reproduce the plasmid in daughter cells (i.e., a "replicon" or
origin of replication) and, ordinarily, one or more phenotypic
selection characteristics, such as resistance to antibiotics,
which permit clones of the host cell containing the plasmid of
interest to be recognized and preferentially grown in selective
media. The utility of bacterial plasmids lies in the fact that
they can be specifically cleaved by one or another restriction
endonuclease or "restriction enzyme", each of which recognizes
a different site on the plasmid DNA. Thereafter heterologous
genes or gene fragments may be inserted into the plasmid by
endwise joining at the cleavage site or at reconstructed ends
adjacent to the cleavage site. (As used herein, the term
"heterologous" refers to a gene not ordinarily found in, or a
polypeptide sequence ordinarily not produced by, a given
microorganism, whereas the term "homologous" refers to a gene
or polypeptide which is found in, or produced by, the
corresponding wild-type microorganism.) Thus formed are
so-called replicable expression vehicles.
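Cleavage by a restriction enzyme at its recognition site can be sketched in Python as a string search. The sketch is an illustration only: the recognition sequences and top-strand cut positions listed are those of enzymes named in this document (XbaI T^CTAGA, PstI CTGCA^G, BglII A^GATCT, HpaII C^CGG), while the function names and the test sequence are hypothetical. The model treats a linear molecule and ignores the complementary strand and DNA methylation.

```python
# Recognition sequence and top-strand cut offset ("^" position) for enzymes
# mentioned in this document.
ENZYMES = {
    "XbaI": ("TCTAGA", 1),   # T^CTAGA
    "PstI": ("CTGCAG", 5),   # CTGCA^G
    "BglII": ("AGATCT", 1),  # A^GATCT
    "HpaII": ("CCGG", 1),    # C^CGG
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """Return top-strand cut positions for every occurrence of the site."""
    site, offset = ENZYMES[enzyme]
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)  # also catches overlapping matches
    return positions


def digest(seq: str, enzyme: str) -> list[str]:
    """Split a linear top-strand sequence at each cut position."""
    cuts = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


demo = "AATCTAGAGGCTGCAGTT"  # hypothetical: one XbaI site and one PstI site
print(digest(demo, "XbaI"))  # ['AAT', 'CTAGAGGCTGCAGTT']
print(digest(demo, "PstI"))  # ['AATCTAGAGGCTGCA', 'GTT']
```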

DNA recombination is performed outside the microorganism, and
the resulting "recombinant" replicable expression vehicle, or
plasmid, can be introduced into microorganisms by a process
known as transformation; large quantities of the
heterologous gene-containing recombinant vehicle are obtained by
growing the transformant. Moreover, where the gene is properly
inserted with reference to portions of the plasmid which govern
the transcription and translation of the encoded DNA message,
the resulting expression vehicle can be used to actually
produce the polypeptide sequence for which the inserted gene
codes, a process referred to as expression.



Expression is initiated in a DNA region known as the promoter.
In the transcription phase of expression, the DNA unwinds,
exposing one strand of the DNA as a template for the
synthesis of messenger RNA in the 5' to 3' direction. The
messenger RNA is, in turn, bound by ribosomes, where the
messenger RNA is translated into a polypeptide chain having
the amino acid sequence for which the DNA codes. Each amino
acid is encoded by a nucleotide triplet or "codon"; the codons
collectively make up the "structural gene", i.e., that part of
the DNA sequence which encodes the amino acid sequence of the
expressed polypeptide product.
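The codon-by-codon reading of a structural gene can be sketched in Python using the standard genetic code. The compressed table below (codons enumerated in T, C, A, G order) encodes the standard code; the function name is hypothetical. As a check, the 15-base sequence ATGGATGCACACAAG, which this document later describes as the initiation codon followed by the codons for the first four amino acids of mature HSA, translates to Met-Asp-Ala-His-Lys.

```python
from itertools import product

# Standard genetic code: codons enumerated as TTT, TTC, TTA, TTG, TCT, ...
# ("*" marks the stop codons TAA, TAG and TGA).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate a sense-strand coding sequence, one codon at a time."""
    coding_dna = coding_dna.upper()
    usable = len(coding_dna) - len(coding_dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE[coding_dna[i:i + 3]] for i in range(0, usable, 3)
    )


print(translate("ATGGATGCACACAAG"))  # MDAHK
```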

Translation is initiated at a "start" signal (ordinarily ATG, 
which in the resulting messenger RNA becomes AUG). So-called 
stop codons, transcribed at the end of the structural gene,
signal the end of translation and, hence, of the production of
further amino acid units. The resulting product may be
obtained by lysing the host cell and recovering the product by 
appropriate purification from other proteins. 
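The role of start and stop codons can be illustrated in Python by scanning for an open reading frame. This is a deliberately simplified sketch: it takes the first ATG as the start, which real ribosomes do not always do (in bacteria the ribosome binding site also matters), and the function name and test sequence are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(seq: str) -> Optional[str]:
    """Return the frame from the first ATG through the first in-frame stop codon.

    Returns None if there is no ATG or no in-frame stop codon after it.
    """
    start = seq.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]
    return None


print(first_orf("CCATGAAATTTTAAGG"))  # ATGAAATTTTAA
```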

In practice, recombinant DNA technology can express
entirely heterologous polypeptides (so-called direct
expression) or, alternatively, may express a heterologous
polypeptide fused to a portion of the amino acid sequence of a
homologous polypeptide. In the latter case, the intended
bioactive product is rendered bioinactive within the fused
homologous/heterologous polypeptide until it is cleaved in an
extracellular environment. See Wetzel, American Scientist 68,
664 (1980).
If recombinant DNA technology is to fully sustain its promise,
systems must be devised which optimize expression of gene
inserts, so that the intended polypeptide products can be made
available in controlled environments and in high yields.

(C) State of the Art 

Sargent et al., in Proc. Natl. Acad. Sci. (USA) 78, 243 (1981),
describe the cloning of rat serum albumin messenger RNA in a
series of recombinant DNA plasmids. This was done to determine
the nucleotide sequences of the clones in order to study the
evolutionary hypothesis of the protein product. Thus, these
workers made no attempt to assemble the cDNA fragments they
prepared.

In Journal of Supramolecular Structure and Cellular
Biochemistry, Supplement 5, 1981 (Alan R. Liss, Inc., NY),
Dugaiczyk et al. report, in abstract form, their studies of the
human gene for human serum albumin. They obtained cDNA
fragments, but there is no evidence that these workers cloned or
produced the fragments for any purpose other than for studying
the basic molecular biology of the α-fetoprotein and serum
albumin genes.



The present invention is based upon the discovery that recombinant DNA
technology can be used to successfully and efficiently produce human
serum albumin in direct form. The product is suitable for use in
therapeutic treatment of human beings in need of supplementation of
albumin. The product is produced by genetically directed microorganisms,
and thus the potential exists to prepare and isolate HSA in a more
efficient manner than is presently possible by blood fractionation
techniques. It is noteworthy that we have
succeeded in genetically directing a microorganism to produce a
protein of enormous length: 584 amino acids, corresponding to an mRNA
transcript upwards of about 2,000 bases.

The present invention comprises the human serum albumin thus produced and
the means and methods of its production. The present invention is
further directed to replicable DNA expression vehicles harboring gene
sequences encoding HSA in directly expressible form. Further, the
present invention is directed to microorganism strains transformed with
the expression vehicles described above and to microbial cultures of such
transformed strains capable of producing HSA. In still further aspects,
the present invention is directed to various processes useful for
preparing said HSA gene sequences, DNA expression vehicles, microorganism
strains and cultures, and to specific embodiments thereof. Still further,
this invention is directed to the preparation of cDNA sequences encoding
polypeptides which are heterologous to the microorganism host, such as
human serum albumin, utilizing synthetic DNA primer sequences
corresponding in sequence to regions adjacent to known restriction
endonuclease sites, such that individual fragments of cDNA can be
prepared which overlap in the regions encoding the common restriction
endonuclease sites. This embodiment enables the precise cleavage and
ligation of the fragments so as to prepare the properly encoded DNA
sequence for the intended polypeptide.



The work described herein involved the expression of human serum albumin
(HSA) as a representative polypeptide which is heterologous to the
microorganism employed as host. Likewise, the work described involved use
of the microorganism E. coli K-12 strain 294 (endA, thi⁻, hsr⁻, hsmₖ⁺),
as described in British Patent Publication No. 2055382 A.
This strain has been deposited with the American Type Culture Collection,
ATCC Accession No. 31446.

The invention, in its most preferred embodiments, is described with
reference to E. coli, including not only E. coli K-12 strain 294,
defined above, but also other known E. coli strains such as E. coli B,
E. coli χ1776 and E. coli W3110, or other microbial strains, many of
which are deposited and (potentially) available from recognized
microorganism depository institutions, such as the American Type Culture
Collection (ATCC); cf. the ATCC catalogue listing. See also German
Offenlegungsschrift 2644432. These other microorganisms include, for
example, Bacilli such as Bacillus subtilis, and Enterobacteriaceae,
among which can be mentioned as examples Salmonella typhimurium and
Serratia marcescens, utilizing plasmids that can replicate and express
heterologous gene sequences therein. Yeast, such as Saccharomyces
cerevisiae, may also be employed to advantage as host organism in the
preparation of the proteins hereof by expression of genes
coding therefor under the control of a yeast promoter. (See the
copending U.S. patent application of Hitzeman et al., filed February 25,
1981 (Attorney Docket No. 100/43), assignee Genentech, Inc. et al., or the
corresponding European Application 82300949.3, which are
incorporated herein by reference.)
Preferred embodiments of the invention will now be described
with reference to the accompanying drawings, in which:

Figs. 1A and 1B are diagrams for use in explaining the
construction of plasmid pHSA1;

Fig. 2 shows the immunoprecipitation of bacterially
synthesised HSA; and

Fig. 3 shows the amino acid sequence of HSA and the
corresponding DNA sequence.






In Fig. 1A, the top line represents the mRNA coding for the human serum
albumin protein and, below it, the regions contained in the cDNA
clones F-47, F-61 and B-44 described further herein. The
initial and final amino acid codons of the mature HSA mRNA are
indicated by circled 1 and 585, respectively. Restriction
endonuclease sites involved in the construction of pHSA1 are
shown by vertical lines. An approximate size scale in
nucleotides is included.

The completed plasmid pHSA1 is shown in Fig. 1B, with HSA coding regions
derived from cDNA clones shaded as in Fig. 1A. Selected restriction
sites and terminal codons number 1 and 585 are indicated as
above. The E. coli trp promoter-operator region is shown with
an arrow representing the direction of transcription. G:C
denotes an oligo dG:dC tail. The leftmost XbaI site and the
initiation codon ATG were added synthetically. The
tetracycline (Tc) and ampicillin (Ap) resistance genes in the
pBR322 portion of pHSA1 are indicated by a heavy line.

Figure 2 depicts the immunoprecipitation of bacterially synthesized HSA.
E. coli cells transformed with albumin expression plasmid pHSA1
(lanes 4 and 5) or control plasmid pLeIF A25 (containing an
interferon α gene in the identical expression vehicle; lanes 2, 3
and 7) were grown in ³⁵S-methionine-supplemented media. Samples
in lanes 2, 4 and 7 were induced for expression from the trp
promoter in M9 media lacking tryptophan; samples in lanes 3 and 5
were grown in tryptophan-containing LB broth to repress the trp
promoter. Each sample lane of the autoradiograph of the
SDS-polyacrylamide gel presented here contains labeled protein
immunoprecipitated from 0.75 ml of cells at a density of A550 = 1.
Lanes 1 and 6 contain radioactive protein standards (BRL) whose
molecular weights in kilodaltons are indicated at the left.
Bacterially synthesized HSA is seen in lane 4 comigrating with the
68,000 d ¹⁴C-labeled bovine serum albumin standard. Increased
production of serum albumin in the induced versus repressed culture
of pHSA1 represents higher levels of synthesis of plasmid-encoded
protein rather than a difference in ³⁵S-methionine pool specific
activities for minimal versus rich media (data not shown). The
sharp band at 60,000 d is an apparent artifact; this band is seen in
both induced and repressed pHSA1 and control transformants, and
binds to preimmune (lane 7) as well as anti-HSA IgGs (lanes 2-5).
The minor 47,000 d band in lane 4 is apparently plasmid encoded and
may represent a prematurely terminated form of bacterially
synthesized HSA.

Figure 3 depicts the nucleotide and amino acid sequence of human serum
albumin.

The DNA sequences of the mature protein coding and 3' untranslated
regions of HSA mRNA were determined from the recombinant plasmid
pHSA1. The DNA sequences of the prepro peptide coding and 5'
untranslated regions were determined from the plasmid P-14 (see
text). Predicted amino acids are included above the DNA sequence
and are numbered from the first residue of the mature protein. The
preceding 24 amino acids comprise the prepro peptide. The five
amino acid residues which disagree with the protein sequence of HSA
reported by both Dayhoff (9) and Meloun et al. (12) are underlined.
The above nucleotide sequence probably does not extend to the true
5' terminus of HSA mRNA. In the albumin direct expression plasmid
pHSA1, the mature protein coding region is immediately preceded by
the E. coli trp promoter-operator-leader peptide ribosome binding
site (36, 37), an artificial XbaI site, and an artificial initiation
codon ATG; the prepro region has been excised. The nucleotides
preceding HSA codon no. 1 in pHSA1 read 5'-TCACGTAAAAAGGGTATCTAGATG.

Detailed Description 

(A) Synthesis and Cloning of cDNA. Poly(A)+ RNA was prepared from
quickly frozen human liver samples obtained from biopsy or from
cadaver donors by either the ribonucleoside-vanadyl complex (17) or
the guanidinium thiocyanate (18) procedure. cDNA reactions were
performed essentially as described in (19), employing as primers
either oligodeoxynucleotides prepared by the phosphotriester
method (20) or oligo(dT)12-18 (Collaborative Research). For
typical cDNA reactions, 25-35 µg of poly(A)+ RNA and 40-80 pmol
of oligonucleotide primer were heated at 90°C for 5 minutes in
50 mM NaCl. The reaction mixture was brought to final
concentrations of 20 mM Tris-HCl pH 8.3, 20 mM KCl, 8 mM
MgCl2, 30 mM dithiothreitol, and 1 mM each of dATP, dCTP, dGTP and dTTP
(plus ³²P-dCTP (Amersham) to follow recovery of product) and
allowed to anneal at 42°C for 5 minutes. 100 units of AMV reverse
transcriptase (BRL) were added and incubation was continued at 42°C
for 45 minutes. Second strand DNA synthesis, S1 nuclease treatment,
size selection on polyacrylamide gels, deoxy(C) tailing, and
annealing to pBR322 which had been cleaved with PstI and deoxy(G)
tailed were performed as previously described (21, 22). The
annealed mixture was used to transform E. coli K-12 strain 294
(23) by a published procedure (24).
(B) Screening of Recombinant Plasmids with ³²P-labelled Probes.



E. coli transformants were grown on LB-agar plates containing
5 µg/ml tetracycline, transferred to nitrocellulose filter paper
(Schleicher and Schuell, BA85) and tested by hybridization
using a modification of the in situ colony screening procedure
(25). ³²P-end-labelled (26) oligodeoxynucleotide fragments
from 12 to 16 nucleotides in length were used as direct
hybridization probes, or ³²P-cDNA probes were synthesized
from RNA using oligo(dT) or oligodeoxynucleotide primers (19).
Filters were hybridized overnight in 5X Denhardt's solution
(27), 5X SSC (1X SSC = 0.15 M NaCl, 0.015 M Na citrate), 50 mM Na
phosphate pH 6.8, 20 ng/ml salmon sperm DNA at temperatures
ranging from 4°C to 42°C, and washed in salt concentrations
varying from 1X to 0.2X SSC plus 0.1 percent SDS at temperatures
ranging from 4°C to 42°C, depending on the length of the
³²P-labelled probe (28). Dried filters were exposed to Kodak
XR-2 X-ray film using DuPont Lightning-Plus intensifying
screens at -80°C.
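The dependence of hybridization temperature on probe length and composition can be estimated for short oligonucleotides with the Wallace rule of thumb, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). This rule is not stated in the source and is swapped in here only as an illustration: it is a rough estimate for short probes (roughly the 12-16 nucleotide range used here) in high-salt buffers, and the function name is hypothetical. For the 15-base primer dATGGATGCACACAAG used in this work, it gives about 44 °C.

```python
def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature (deg C) of a short DNA oligonucleotide.

    Wallace rule: 2 deg C per A or T, 4 deg C per G or C. Only a rough guide
    for short probes in high-salt hybridization buffers.
    """
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


print(wallace_tm("ATGGATGCACACAAG"))  # 44
```

Shorter or A/T-rich probes give lower estimates, which is consistent with adjusting hybridization and wash temperatures to probe length.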

(C) DNA Preparation and Restriction Enzyme Analysis. Plasmid DNA
was prepared in either large scale (29) or small scale
("miniprep"; 30) quantities and cleaved by restriction
endonucleases (New England Biolabs, BRL) following the
manufacturers' conditions. Slab gel electrophoresis conditions
and electroelution of DNA fragments from gels have been
described (31).

(D) DNA Sequencing. DNA sequencing was accomplished both by the
method of Maxam and Gilbert (26) utilizing end-labelled DNA
fragments and by dideoxy chain termination (32) on single
stranded DNA from phage M13 mp7 subclones (33) utilizing
synthetic oligonucleotide (20) primers. Each region was
independently sequenced several times.

(E) Construction of 5' End of Albumin Gene for Direct Expression of
HSA. 10 µg (~16 pmol) of the ~1200 bp PstI insert of
plasmid F-47 was boiled in H2O for 5 minutes and combined
with 100 pmol of ³²P-end-labelled 5' primer
(dATGGATGCACACAAG). The mixture was quenched on ice and
brought to a final volume of 120 µl of 6 mM Tris-HCl pH 7.5, 6
mM MgCl2, 60 mM NaCl, and 0.5 mM each of dATP, dCTP, dGTP and dTTP at 0°C.
10 units of DNA polymerase I Klenow fragment (Boehringer Mannheim)
were added and the mixture was incubated at 24°C for 5
hr. Following phenol/chloroform extraction, the product was
digested with HpaII and electrophoresed in a 5 percent
polyacrylamide gel, and the desired 450 bp fragment was
electroeluted. The single stranded overhang produced by XbaI
digestion of the vector plasmid pLeIF A25 (21) was filled in to
produce blunt DNA ends by adding deoxynucleoside triphosphates
to 10 µM and 10 units of DNA polymerase I Klenow fragment to the
restriction endonuclease reaction mix and incubating at 12°C for
10 minutes. Restriction endonuclease fragments (0.1-1 ng in
approximate molar equality) were annealed and ligated overnight
at 12°C in 20 µl of 50 mM Tris-HCl pH 7.6, 10 mM MgCl2, 0.1 mM
EDTA, 5 mM dithiothreitol, 1 mM rATP with 50 units T4 ligase
(N.E. Biolabs). Further details of plasmid construction are
discussed below.
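The quantities in this step can be related by a standard mass-to-moles conversion for double-stranded DNA, sketched here in Python. The sketch assumes an average of 660 g/mol per base pair (a common approximation, not a figure from the source), and the function name is hypothetical. Under that assumption, 10 µg of a 1200 bp fragment is about 12.6 pmol, the same order as the ~16 pmol quoted above (the difference may reflect the approximate insert length or the mass constant used), and 100 pmol of primer is roughly an eightfold molar excess over the complementary template strand.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of duplex DNA (approximation)


def ug_to_pmol_dsdna(micrograms: float, length_bp: int) -> float:
    """Convert a mass of double-stranded DNA to picomoles of molecules."""
    grams = micrograms * 1e-6
    moles = grams / (length_bp * AVG_BP_MASS)
    return moles * 1e12


insert_pmol = ug_to_pmol_dsdna(10, 1200)
print(round(insert_pmol, 1))        # 12.6
print(round(100 / insert_pmol, 1))  # 7.9  (primer : template-strand ratio)
```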

(F) Protein Analysis. Two ml cultures of recombinant E. coli
strains were grown in either LB or M9 media plus 5 µg/ml
tetracycline to densities of A550 = 1.0, pelleted, washed,
repelleted, and suspended in 2 ml of LB or supplemented M9 (M9
+ 0.2 percent glucose, 1 µg/ml thiamine, 20 µg/ml standard
amino acids, except that methionine was 2 µg/ml and tryptophan was
excluded). Each growth medium also contained 5 µg/ml
tetracycline and 100 µCi ³⁵S-methionine (NEN; 1200 Ci/mmol).
After 1 hr incubation at 37°C, bacteria were pelleted,
freeze-thawed and resuspended in 200 µl 50 mM Tris-HCl pH 7.5, 0.12 mM
NaEDTA, then placed on ice for 10 minutes following subsequent
additions of lysozyme to 1 mg/ml, NP40 to 0.2 percent, and NaCl
to 0.35 M. The lysate was adjusted to 10 mM MgCl2 and
incubated with 50 µg/ml DNase I (Worthington) on ice for 30
min. Insoluble material was removed by mild centrifugation.
Samples were immunoprecipitated with rabbit anti-HSA (Cappel
Labs) and staphylococcal absorbent (Pansorbin; Calbiochem) as
described (34), and subjected to SDS-polyacrylamide gel
electrophoresis (35).

(G) cDNA Cloning. Initial cDNA clones primed with oligo(dT) were
screened by colony hybridization both with total liver cDNA (to
identify clones containing abundant RNA species) and with two
³²P-labelled cDNAs primed from liver mRNA by two sets of four
11-base oligodeoxynucleotides synthesized to represent the
possible coding variations for amino acids 546-549 and 294-297
of HSA. Positive colonies never contained more than about the
3' half of the protein coding region of the expected HSA mRNA
sequence. (The longest of these recombinants was designated
B-44.) Since existing procedures were unable to directly copy
an mRNA of the expected size (~2000 bp), synthetic
oligodeoxynucleotides were prepared to correspond to the
antimessage strand at regions near the 5' extreme of B-44.
From the nucleotide sequence of B-44, we constructed a 12-base
oligodeoxynucleotide corresponding to amino acids 369-373.
This was used to prime cDNA synthesis from liver mRNA and produce
cDNA clones in pBR322 containing the 5' portion of the HSA
message while overlapping the existing B-44 recombinant.
Approximately 400 resulting clones were screened by colony
hybridization with a 16-base oligodeoxynucleotide fragment
located slightly upstream in the mRNA sequence we had thus far
determined. Approximately 40 percent of the colonies
hybridized to both probes. Many of those colonies which failed
to contain hybridizing plasmids presumably resulted from RNA
self-priming or priming with contaminating oligo(dT) during
reverse transcription, or lost the 3' region containing the
sequence used for screening. "Miniprep" amounts of plasmid DNA
from hybridizing colonies were digested with PstI. Three
recombinant plasmids contained sufficiently large inserts to
code for the remaining 5' portion of the HSA message. Two of
these (F-15 and F-47) contained the extreme 5' coding portion
of the gene but failed to extend back to a PstI site necessary
for joining with B-44 to re-form the complete albumin gene.
Recombinant F-61 possessed this site but lacked the extreme 5'
end. A three-part reconstruction of the entire message
sequence was possible employing restriction endonuclease sites
in common with the part-length clones F-47, F-61 and B-44
(Fig. 1).
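The "possible coding variations" for a short stretch of amino acids can be enumerated mechanically, as in the Python sketch below. It is illustrative only: the peptide MDEK is hypothetical (the actual residues at positions 546-549 and 294-297 are not given in this section), the function names are hypothetical, and dropping the third base of the last codon is an assumption about how four codons were reduced to 11 bases. Because the primers must anneal to mRNA, each sense-strand variant is converted to its reverse complement (the antimessage strand).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS_FOR: dict[str, list[str]] = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sense_variants(peptide: str, drop_last_wobble: bool = True) -> list[str]:
    """All sense-strand sequences that could encode `peptide`.

    With drop_last_wobble=True the third base of the final codon is omitted,
    so a four-residue peptide yields 11-base sequences.
    """
    choices = [CODONS_FOR[aa] for aa in peptide]
    if drop_last_wobble:
        choices[-1] = sorted({c[:2] for c in choices[-1]})
    return ["".join(parts) for parts in product(*choices)]


def reverse_complement(seq: str) -> str:
    """Antimessage-strand sequence, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


variants = sense_variants("MDEK")  # hypothetical peptide
primers = [reverse_complement(v) for v in variants]
print(len(primers), {len(p) for p in primers})  # 4 {11}
print(primers[0])                                # TTTTCATCCAT
```

Omitting the final wobble base removes a degenerate position, which keeps the pool small (four sequences here instead of eight).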


An additional cDNA clone extending further 5' was obtained by
similar oligodeoxynucleotide-primed cDNA synthesis (from a
primer corresponding to amino acid codons no. 175-179).
Although not employed in the construction of the mature HSA
expression plasmid, this cDNA clone (P-14) allowed
determination of the DNA sequence of the "prepro" peptide
coding and 5' non-coding regions of the HSA mRNA.






The mature HSA mRNA sequence was joined to a vector plasmid for
direct expression of the mature protein in E. coli via the trp
promoter-operator. The plasmid pLeIF A25 directs the
expression of human leukocyte interferon A (IFNα2) (21). It
was digested with XbaI, and the cleavage site was "filled in" to
produce blunt DNA ends with DNA polymerase I Klenow fragment
and deoxynucleoside triphosphates. After subsequent digestion
with PstI, a "vector" fragment was gel purified. It contained
pBR322 sequences and a 300 bp fragment of the E. coli trp
promoter, operator, and ribosome binding site of the trp leader
peptide, terminating in the artificially blunt-ended XbaI
cleavage site. A 15-base oligodeoxynucleotide was designed to
contain the initiation codon ATG followed by the 12 nucleotides
coding for the first four amino acids of HSA, as determined by
DNA sequence analysis of clone F-47. In a process referred to
as "primer repair", the gene-containing PstI fragment of F-47
was denatured, annealed with excess 15-mer and reacted with DNA
polymerase I Klenow fragment and deoxynucleoside triphosphates.
This reaction extends a new second strand downstream from the
annealed oligonucleotide. It degrades the single-stranded DNA
upstream of codon number one and then polymerizes upstream
three nucleotides complementary to ATG. In addition, when this
product is blunt-end ligated to the prepared vector fragment,
its initial adenosine residue recreates an XbaI restriction
site. Following the primer repair reaction, the DNA was
digested with HpaII, and a 450 bp fragment containing the 5'
portion of the mature albumin gene was gel purified (see Fig.
1). This fragment was annealed and ligated to the vector
fragment and to the gel-isolated HpaII to PstI portion of F-47.
The ligation mixture was used to transform E. coli cells. Diagnostic restriction
endonuclease digests of plasmid minipreps identified the
recombinant A-26, which contained the 5' portion of the mature
albumin coding region ligated properly to the trp promoter-operator.
For the final steps in assembly, the A-26 plasmid
was digested with BglII plus PstI, and the approximately 4 kb fragment was
gel purified. This was annealed and ligated to a 390 bp PstI-BglII
partial digestion fragment purified from F-61 and to a 1000
bp PstI fragment of B-44. Restriction endonuclease analysis of
the resulting transformants identified plasmids containing the
entire HSA coding sequence properly aligned for direct
expression of the mature protein. One such recombinant plasmid
was designated pHSA1. When E. coli containing pHSA1 is grown
in minimal medium lacking tryptophan, the cells produce a
protein that specifically reacts with HSA antibodies and
comigrates with HSA in SDS polyacrylamide gel electrophoresis (Fig.
2). No such protein is produced by identical recombinants
grown in rich broth. This implies that production in E. coli of the
putative HSA protein is under control of the trp
promoter-operator as designed. To ensure the integrity of the
HSA structural gene in the recombinant plasmid, pHSA1 was
subjected to DNA sequence analysis.
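The recreation of the XbaI site can be checked with a short string model. XbaI recognizes TCTAGA and cuts T^CTAGA. The upstream vector end therefore carries a 5' CTAG overhang, and Klenow fill-in converts that end to a blunt ...TCTAG. When an insert that begins with the adenosine of ATG is blunt-ligated to this end, TCTAGA is restored. This is a sketch on the top strand only. The vector sequence and the 12 placeholder "N" nucleotides standing in for the first four HSA codons are hypothetical and are not the actual trp or F-47 sequences.

```python
XBAI_SITE = "TCTAGA"  # XbaI recognition sequence; the enzyme cuts T^CTAGA


def xbai_upstream_end(seq: str) -> str:
    """Top strand of the fragment upstream of the first XbaI site after cutting.

    XbaI cuts between T and C on each strand, so the upstream top strand ends
    in ...T and the complementary strand carries a 5' CTAG overhang.
    """
    i = seq.find(XBAI_SITE)
    if i < 0:
        raise ValueError("no XbaI site")
    return seq[: i + 1]


def klenow_fill_in(top_strand: str) -> str:
    """Fill in the 5' overhang: the polymerase copies CTAG onto the 3' end."""
    return top_strand + "CTAG"


def make_primer(first_four_codons: str) -> str:
    """15-mer: initiation codon ATG followed by 12 nt for the first four codons."""
    if len(first_four_codons) != 12:
        raise ValueError("expected 12 nucleotides")
    return "ATG" + first_four_codons


# Hypothetical stand-ins: neither the real trp promoter/leader sequence nor
# the real HSA codons are reproduced here.
vector = "AAAATTTTGGGG" + XBAI_SITE + "CCCC"
primer = make_primer("N" * 12)
blunt_end = klenow_fill_in(xbai_upstream_end(vector))
construct = blunt_end + primer  # blunt-end ligation; the insert starts with ATG

site_pos = construct.find(XBAI_SITE)
atg_pos = construct.find("ATG")
print(construct)
print(site_pos, atg_pos)
```

In this model, the last base of the recreated TCTAGA is the A of the initiation codon. This is the sense in which the initial adenosine residue recreates the site.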

(H) DNA Sequence Analysis 

The albumin cDNA portion (and surrounding regions) of pHSA1
was sequenced to completion by both the chemical degradation
method of Maxam and Gilbert (26) and the dideoxy chain
termination procedure, employing templates derived from single-stranded
M13 mp7 phage derivatives (32, 33). All nucleotides
were sequenced at least twice. The DNA sequence is shown in
Fig. 3 along with the predicted amino acid sequence of the HSA
protein. The DNA sequence further 5' to the mature HSA coding
region was also determined from the cDNA clone P-14 and is
included in Fig. 3.
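The predicted amino acid sequence is obtained by reading the coding sequence codon by codon with the standard genetic code. Below is a minimal sketch, assuming an in-frame sequence containing only A, C, G and T. An ambiguous base would raise a KeyError. The toy coding sequence is hypothetical and is not taken from Fig. 3. In pHSA1, reading from the artificial ATG would give an N-terminal Met ahead of the mature sequence.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in T, C, A, G order
# (first base varies slowest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy coding sequence.
toy_cds = "ATGGATGCTTGTAAATAA"
print(translate(toy_cds), len(translate(toy_cds)))
```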
DNA sequence analysis confirmed that the artificial initiation
codon and the complete mature HSA coding sequence directly
follow the E. coli trp promoter-operator as desired. The ATG
initiator follows the putative E. coli ribosome binding
sequence (36) of the trp leader peptide (37) by 9 nucleotides.

Translation of the DNA sequence of pHSA1 predicts a mature HSA
protein of 585 amino acids. Various published protein
sequences of HSA disagree at about 20 amino acids. The present
sequence differs by eleven residues from Moulon et al. (12).
It differs by 28 residues from the sequence reported in the Dayhoff catalogue
(9), which is credited as arising primarily from Behrens et al. (10), with
contributions by Moulon et al. (12). Most of these differences
represent inversions of pairs of adjacent residues or
glutamine-glutamic acid disagreements. At only five of the 585
residues does our sequence differ from the residue reported by
both Dayhoff (9) and Moulon et al. (12), and three of these
five differences represent glutamine-glutamic acid interchanges
(underlined in Figure 3). At all discrepant positions the
nucleotide sequencing has been carefully rechecked, and it is
unlikely that DNA sequencing errors are the cause of these
reported differences. The possibility of artifacts introduced
by cDNA cloning cannot be ruled out. However, other likely
explanations exist for the amino acid sequence differences
among various reports. These include changes in amidation
(affecting glutamine-glutamic acid discrimination) occurring
either in vivo or during protein sequencing (38). Polymorphism
in HSA proteins may also account for some differences. Over
twenty genetic variants of HSA have been detected by protein
electrophoresis (39), but these have not yet been analyzed at the
amino acid sequence level. It is also worth noting that our
predicted HSA protein sequence is 585 amino acids long, in
agreement with Moulon (12) but not Dayhoff (9). The difference
is accounted for by the deletion (in ref. 9) of one
phenylalanine (Phe) residue in a Phe-Phe pair at amino acids
156-157.
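The kinds of discrepancy listed above can be tallied mechanically once two protein sequences are aligned. The following is a sketch that assumes aligned, equal-length input in one-letter code. The Dayhoff sequence, being one Phe shorter, would first need a gap inserted at the Phe-Phe pair, and that gap would then be counted as "other". The classification rules are one reasonable reading of the text rather than the authors' procedure. The toy sequences are hypothetical.

```python
def classify_differences(a: str, b: str) -> dict:
    """Classify residue differences between two aligned, equal-length protein sequences.

    Positions are 1-based. An 'inversion' is a pair of adjacent residues that
    appear in swapped order (XY vs YX); a 'gln_glu' difference is Q in one
    sequence opposite E in the other; everything else is 'other'.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    found = {"inversion": [], "gln_glu": [], "other": []}
    i = 0
    while i < len(a):
        if a[i] == b[i]:
            i += 1
        elif i + 1 < len(a) and a[i] == b[i + 1] and a[i + 1] == b[i]:
            found["inversion"].append(i + 1)  # start of the swapped pair
            i += 2
        else:
            key = "gln_glu" if {a[i], b[i]} == {"Q", "E"} else "other"
            found[key].append(i + 1)
            i += 1
    return found


# Hypothetical toy sequences in one-letter code.
seq_a = "ACQEFGHIKL"
seq_b = "ACEEFHGIKM"
print(classify_differences(seq_a, seq_b))
print(sum(x != y for x, y in zip(seq_a, seq_b)))  # number of residues that differ
```

An inversion accounts for two differing residues but only one event. This is why the residue count (4 here) exceeds the number of classified events (3).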

When compared to the DNA sequence of a rat serum albumin cDNA
clone (16), the present mature HSA sequence shares 74 percent
homology at the nucleotide level and 73 percent homology at the amino
acid level. (The rat SA protein is one amino acid shorter than
HSA; the carboxy-terminal residue of HSA is absent in the rat
protein.) All 35 cysteine residues are located in identical
positions in both proteins. The predicted "prepro" peptide
region of HSA shares 76 percent nucleotide and 75 percent amino
acid homology with that reported from the rat cDNA clone (16).
Interspecies sequence homology is reduced in the portion of the
3' untranslated region that can be compared (the published rat
cDNA clone ends before the 3' mRNA terminus). The HSA cDNA
contains the hexanucleotide AATAAA 28 nucleotides before the
site of poly(A) addition. This is a common feature of
eukaryotic mRNAs, first noted by Proudfoot and Brownlee (40).
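Both quantities in this paragraph can be computed from sequence strings. "Percent homology" is read here as percent identity over aligned columns, with gap columns skipped. "28 nucleotides before the site of poly(A) addition" is read here as the number of bases between the end of AATAAA and the poly(A) site. Both readings are assumptions, because the text does not state its counting conventions. The sequences below are hypothetical toys.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns that are identical, skipping gap ('-') columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def signal_distance(utr3: str, signal: str = "AATAAA") -> int:
    """Nucleotides between the end of the last `signal` and the 3' end of `utr3`.

    The 3' end of `utr3` is taken to be the poly(A) addition site.
    """
    i = utr3.rfind(signal)
    if i < 0:
        raise ValueError(f"{signal} not found")
    return len(utr3) - (i + len(signal))


# Hypothetical aligned toy sequences and a toy 3' untranslated region.
print(round(percent_identity("ATGGCT-TTA", "ATGCCTGTTA"), 1))
print(signal_distance("GCGC" + "AATAAA" + "T" * 10))
```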

Pharmaceutical Compositions 


The compounds of the present invention can be formulated according to
known methods to prepare pharmaceutically useful compositions, whereby
the polypeptide hereof is combined in admixture with a pharmaceutically
acceptable carrier vehicle. Suitable vehicles and their formulation are
described in Remington's Pharmaceutical Sciences by E.W. Martin, which is
hereby incorporated by reference. Such compositions will contain an
effective amount of the protein hereof together with a suitable amount of
vehicle in order to prepare pharmaceutically acceptable compositions
suitable for effective administration to the host. One preferred mode of
administration is parenteral.
CLAIMS:



1. A method of constructing a DNA sequence encoding a
polypeptide comprising a functional protein or a bioactive
portion thereof, said DNA sequence being designed for
insertion together with appropriately positioned
translational start and stop signals into an expression vector
under the control of a microbially operable promoter,
comprising the steps of:

(a) providing messenger RNA comprising the entire
coding sequence of said polypeptide,

(b) obtaining by reverse transcription from the
messenger RNA of step (a) a series of fragments of
double stranded cDNA, each of said fragments
corresponding in sequence to a portion of said
coding sequence and thus encoding a portion of
said polypeptide, wherein said fragments overlap
in sequence at the respective terminal regions
thereof, the overlapping portions thereof
containing common restriction endonuclease sites,
said fragments in totality comprising the entire
coding sequence of said polypeptide,

(c) cleaving the fragments of step (b) so as to
prepare corresponding fragments which, when
properly ligated, encode said polypeptide, and

(d) ligating the fragments obtained from step (c).
2. A method of constructing a vector for use in
expressing a polypeptide comprising performing the method of
claim 1 to produce a product comprising the entire coding
sequence of said polypeptide, and introducing the product
into a vector under proper reading frame control of an
expression promoter.

3. The method according to claim 1 or 2 wherein said
polypeptide comprises the amino acid sequence of human serum
albumin.

4. The method according to claim 3 wherein the polypeptide
contains a cleavable conjugate or microbial signal
protein attached to the N-terminus of the ordinarily first
amino acid of said human serum albumin.

5. The method according to claim 4 wherein said cleavable 
conjugate is the amino acid methionine. 

6. A method according to any preceding claim wherein said
DNA sequence is the gene encoding human serum albumin.

7. A DNA sequence consisting essentially of a sequence
encoding human serum albumin.

8. A DNA sequence according to claim 7 operably linked
with a DNA vector capable of effecting the microbial
expression of said sequence so as to prepare the
corresponding human serum albumin.

9. A replicable microbial expression vehicle capable, in
a transformant microorganism, of expressing the DNA sequence
according to claim 7.

10. A microorganism transformed with the vehicle according
to claim 9.

11. A fermentation culture comprising a transformed
microorganism according to claim 10.

12. The microorganism according to claim 10, obtained by
transforming an E. coli bacterial or a yeast strain.

13. The plasmid pHSA1.

14. An E. coli bacterial strain transformed with the
plasmid according to claim 13.

15. A process which comprises microbially expressing human 
serum albumin in mature form. 

16. The use of human serum albumin prepared by the process
of claim 15 for therapeutic treatment of humans or for
preparing pharmaceutical compositions useful for such
treatment.
[Fig. 3: DNA sequence of the HSA cDNA of pHSA1 and clone P-14 with the predicted amino acid sequence, covering the "prepro" peptide, mature HSA (residues 1-585) and the 3' untranslated region to the poly(A) addition site.]